{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Implenting Custom Classifier\n",
    "\n",
    "In this example, we implemented our first custom K-Nearest Neighbors and then we compare it accuracy with the sklearn KNN classifier. For this we are using IRIS dataset and detail about the IRIS dataset can be found [here](http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html)\n",
    "\n",
    "### Iris Dataset \n",
    "The Iris dataset is consists of 3 different types of irises (Setosa Versicolour ad Virginica) petail and sepal length stored in a 150x4 numpy,ndarray. More detail can be [found here](https://en.wikipedia.org/wiki/Iris_flower_data_set)\n",
    "\n",
    "So, we have 4 features ( Sepal length, Sepal Width, Petal Length, Petal Width) and we have one Output (Species), it is the name of Iris or in terms of machine learning, the name of the class to which it belongs\n",
    "\n",
    "#### Importing Dataset \n",
    "In the first step, we are importing the iris dataset and then partitioning it into test and training dataset\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from sklearn import datasets\n",
    "iris = datasets.load_iris()\n",
    "\n",
    "# X = inputs for the classifier\n",
    "X = iris.data\n",
    "#print (X)\n",
    "\n",
    "# y = ouput \n",
    "y = iris.target\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# We can either manually partition dataset into test and training dataset or either use cross validation\n",
    "from sklearn.cross_validation import train_test_split\n",
    "\n",
    "#help(train_test_split)\n",
    "# Using half of the dataset for testing\n",
    "X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.7)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## K-Neighbors Classifier\n",
    "#### 1) Creating custom KNN classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import random\n",
    "from scipy.spatial import distance\n",
    "\n",
    "def getdistance(point1 , point2):\n",
    "    return distance.euclidean(point1, point2)\n",
    "\n",
    "class CustomKNNClassifier():\n",
    "    def fit(self,X_train, y_train):\n",
    "        self.X_train = X_train\n",
    "        self.y_train = y_train\n",
    "    \n",
    "    def predict(self,X_test):\n",
    "        predictions = []\n",
    "        for row in X_test:\n",
    "            label = self.closestlabel(row)\n",
    "            predictions.append(label)\n",
    "        \n",
    "        return predictions\n",
    "    \n",
    "    def closestlabel(self,row):\n",
    "        best_index = 0\n",
    "        best_dist = euc(row,self.X_train[0])\n",
    "        \n",
    "        for i in range(1, len(self.X_train)):\n",
    "            dist = getdistance(row,self.X_train[i])\n",
    "            if best_dist > dist:\n",
    "                best_dist = dist\n",
    "                best_index = i\n",
    "        \n",
    "        return self.y_train[best_index]\n",
    "\n",
    "    \n",
    "    \n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2, 1, 1, 1, 1, 2, 2, 2, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 2, 1, 0, 1, 0, 1, 2, 1, 0, 1, 0, 1, 0, 1, 1, 2, 0, 2, 2, 0, 0, 1, 1, 2, 2, 0, 1, 0, 1, 2, 1, 2, 2, 1, 1, 1, 1, 0, 2, 0, 0, 2, 2, 2, 2, 2, 0, 2, 2, 1, 2, 2, 2, 2, 0, 0, 1, 1, 1, 0, 2, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0, 0, 0, 2, 2, 1, 1, 1, 0, 0, 0, 0, 1, 1, 2, 2]\n"
     ]
    }
   ],
   "source": [
    "my_classifier = CustomKNNClassifier()\n",
    "my_classifier.fit(X_train, y_train)\n",
    "\n",
    "\n",
    "predictions = my_classifier.predict(X_test)\n",
    "print (predictions)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.93333333333333335"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Checking accuracy of the classifier\n",
    "from sklearn.metrics import accuracy_score\n",
    "accuracy_score(y_test,predictions)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2) Using Sklearn KNN "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.93333333333333335"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "classifier = KNNClassifier()\n",
    "classifier.fit(X_train, y_train)\n",
    "\n",
    "predict = classifier.predict(X_test)\n",
    "# Checking accuracy of the classifier\n",
    "from sklearn.metrics import accuracy_score\n",
    "accuracy_score(y_test,predict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary\n",
    "#### Pros:\n",
    "1. Relatively easy and simple to implement and understand\n",
    "\n",
    "#### Cons\n",
    "1. Computationally intensive\n",
    "2. Relationships between the features are hard to represent\n"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [conda root]",
   "language": "python",
   "name": "conda-root-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
